ACBiMA: Advanced Chinese Bi-Character Word Morphological Analyzer
نویسندگان
چکیده
While morphological information has been demonstrated to be useful for various Chinese NLP tasks, there is still a lack of complete theories, category schemes, and toolkits for Chinese morphology. This paper focuses on the morphological structures of Chinese bi-character words, where a corpus were collected based on a welldefined morphological type scheme covering both Chinese derived words and compound words. With the corpus, a morphological analyzer is developed to classify Chinese bi-character words into the defined categories, which outperforms strong baselines and achieves about 66% macro F-measure for compound words, and effectively covers derived words.
منابع مشابه
Chinese Morphological Analysis with Character-level POS Tagging
The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters. However, existing methods have not yet fully utilized the potentials of Chinese characters. In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis. We propose the first tagset designed...
متن کاملPredicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches
This paper presented an overview of Chinese bi-character words’ morphological types, and proposed a set of features for machine learning approaches to predict these types based on composite characters’ information. First, eight morphological types were defined, and 6,500 Chinese bi-character words were annotated with these types. After pre-processing, 6,178 words were selected to construct a co...
متن کاملA Lexicon-Constrained Character Model for Chinese Morphological Analysis
This paper proposes a lexicon-constrained character model that combines both word and character features to solve complicated issues in Chinese morphological analysis. A Chinese character-based model constrained by a lexicon is built to acquire word building rules. Each character in a Chinese sentence is assigned a tag by the proposed model. The word segmentation and partof-speech tagging resul...
متن کاملCharacter Stream Parsing of Mixed-lingual Text
In multilingual countries text-to-speech synthesis systems often have to deal with sentences containing inclusions of multiple other languages in form of phrases, words or even parts of words. Such sentences can only be correctly processed using a system that incorporates a mixed-lingual morphological and syntactic analyzer. A prerequisite for such an analyzer is the correct identification of w...
متن کاملDesign of Chinese Morphological Analyzer
This is a pilot study which aims at the design of a Chinese morphological analyzer which is in state to predict the syntactic and semantic properties of nominal, verbal and adjectival compounds. Morphological structures of compound words contain the essential information of knowing their syntactic and semantic characteristics. In particular, morphological analysis is a primary step for predicti...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015